Influence-Augmented Online Planning for Complex Environments
How can we plan efficiently in real time to control an agent in a complex environment that may involve many other agents? While existing sample-based planners have enjoyed empirical success in large POMDPs, their performance heavily relies on a fast simulator. However, real-world scenarios are complex in nature and their simulators are often computationally demanding, which severely limits the performance of online planners. In this work, we propose influence-augmented online planning, a principled method to transform a factored simulator of the entire environment into a local simulator that samples only the state variables that are most relevant to the observation and reward of the planning agent and captures the incoming influence from the rest of the environment using machine learning methods. Our main experimental results show that planning on this less accurate but much faster local simulator with POMCP leads to higher real-time planning performance than planning on the simulator that models the entire environment.
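To make the idea of a local simulator concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a hypothetical toy domain in which the planning agent controls a single queue and the only incoming influence is whether a new item arrives from the rest of the environment. The names `local_step`, `rollout`, and `predictor` are illustrative assumptions; `predictor` stands in for the learned influence model and is here any function mapping the local action-observation history to an arrival probability.

```python
import random
from typing import Callable, List, Tuple

# Local action-observation history: (action, observation) pairs.
History = List[Tuple[int, int]]


def local_step(
    queue: int,
    history: History,
    action: int,
    predictor: Callable[[History], float],
    rng: random.Random,
    capacity: int = 5,
) -> Tuple[int, History, int, float]:
    """One step of a hypothetical local simulator.

    Only the local state (the queue length) is simulated. The influence
    from the rest of the environment is sampled from `predictor`, a
    stand-in for a learned influence model.
    """
    # Sample the influence source variable instead of simulating
    # the rest of the environment.
    p_arrival = predictor(history)
    arrival = 1 if rng.random() < p_arrival else 0

    # Local dynamics: action 1 serves one item if the queue is non-empty.
    departed = 1 if action == 1 and queue > 0 else 0
    next_queue = min(capacity, queue - departed + arrival)

    observation = next_queue          # toy assumption: queue is observed
    reward = -float(next_queue)       # toy assumption: penalize queue length
    next_history = history + [(action, observation)]
    return next_queue, next_history, observation, reward


def rollout(
    predictor: Callable[[History], float],
    policy: Callable[[int], int],
    steps: int,
    seed: int = 0,
) -> float:
    """Accumulate reward over `steps` simulated local steps."""
    rng = random.Random(seed)
    queue, history, total = 0, [], 0.0
    for _ in range(steps):
        action = policy(queue)
        queue, history, _, reward = local_step(
            queue, history, action, predictor, rng
        )
        total += reward
    return total
```

A sample-based planner such as POMCP would call a step function like `local_step` many times per decision, which is why a cheaper simulator can translate into more simulations within a fixed time budget. In this sketch, a constant predictor such as `lambda history: 0.3` could be swapped for a trained sequence model without changing the simulator.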
Review for NeurIPS paper: Influence-Augmented Online Planning for Complex Environments
Weaknesses: The major concern is that the idea of exploiting "influences" of domain variables to reduce the state space of POMDPs is not new. In the literature, variables that only indirectly influence agent behavior are referred to as exogenous variables. The RNN-based influence learning is new, but the following two papers have studied other reasoning and learning methods for incorporating exogenous variables into POMDP-based action selection. Zhang S, Khandelwal P, Stone P. Dynamically constructed (PO)MDPs for adaptive robot planning.